On statistics, computation and scalability
How should statistical procedures be designed so as to be scalable
computationally to the massive datasets that are increasingly the norm? When
coupled with the requirement that an answer to an inferential question be
delivered within a certain time budget, this question has significant
repercussions for the field of statistics. With the goal of identifying
"time-data tradeoffs," we investigate some of the statistical consequences of
computational perspectives on scalability, in particular divide-and-conquer
methodology and hierarchies of convex relaxations.
Comment: Published at http://dx.doi.org/10.3150/12-BEJSP17 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
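The divide-and-conquer idea mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic example and not the procedure studied in the paper: it assumes the estimator is a sample mean, that the data split evenly into chunks, and that local estimates are combined by simple averaging. The function name divide_and_conquer_mean and the simulated data are hypothetical.

```python
import random
import statistics


def divide_and_conquer_mean(data, num_chunks):
    """Estimate the mean by averaging per-chunk means.

    Assumes len(data) is divisible by num_chunks, so every chunk has the
    same size and the combined estimate matches the full-data mean.
    """
    if len(data) % num_chunks != 0:
        raise ValueError("this sketch assumes equally sized chunks")
    chunk_size = len(data) // num_chunks
    # Divide: each chunk could be processed on a separate machine.
    chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]
    local_estimates = [statistics.fmean(chunk) for chunk in chunks]
    # Conquer: combine the local estimates into one answer.
    return statistics.fmean(local_estimates)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
data = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
estimate = divide_and_conquer_mean(data, num_chunks=10)
full_data_mean = statistics.fmean(data)
```

For a linear statistic such as the mean, the combined estimate agrees with the full-data estimate up to rounding; for nonlinear estimators the two generally differ, which is where statistical consequences of this kind of splitting arise.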
Leo Breiman
Statistics is a uniquely difficult field to convey to the uninitiated. It
sits astride the abstract and the concrete, the theoretical and the applied. It
has a mathematical flavor and yet it is not simply a branch of mathematics. Its
core problems blend into those of the disciplines that probe into the nature of
intelligence and thought, in particular philosophy, psychology and artificial
intelligence. Debates over foundational issues have waxed and waned, but the
field has not yet arrived at a single foundational perspective.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS387 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Bayesian inference for queueing networks and modeling of internet services
Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle
billions of requests per day on clusters of thousands of computers. Because
these services operate under strict performance requirements, a statistical
understanding of their performance is of great practical interest. Such
services are modeled by networks of queues, where each queue models one of the
computers in the system. A key challenge is that the data are incomplete,
because recording detailed information about every request to a heavily used
system can require unacceptable overhead. In this paper we develop a Bayesian
perspective on queueing models in which the arrival and departure times that
are not observed are treated as latent variables. Underlying this viewpoint is
the observation that a queueing model defines a deterministic transformation
between the data and a set of independent variables called the service times.
With this viewpoint in hand, we sample from the posterior distribution over
missing data and model parameters using Markov chain Monte Carlo. We evaluate
our framework on data from a benchmark Web application. We also present a
simple technique for selection among nested queueing models. We are unaware of
any previous work that considers inference in networks of queues in the
presence of missing data.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS392 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
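The deterministic transformation between the data and the service times, described in the abstract above, can be sketched for the simplest case. This is an illustrative example only, assuming a single-server FIFO queue with fully observed arrival and departure times; the function names and the toy trace are hypothetical and are not taken from the paper.

```python
def departures_from(arrivals, services):
    """Compute departure times for a single-server FIFO queue.

    A job starts service once it has arrived and the previous job has left.
    """
    departures = []
    previous_departure = float("-inf")
    for arrival, service in zip(arrivals, services):
        start = max(arrival, previous_departure)
        previous_departure = start + service
        departures.append(previous_departure)
    return departures


def service_times(arrivals, departures):
    """Invert the map above: recover service times from a complete trace."""
    services = []
    previous_departure = float("-inf")
    for arrival, departure in zip(arrivals, departures):
        start = max(arrival, previous_departure)
        services.append(departure - start)
        previous_departure = departure
    return services


# Toy trace (made-up numbers, chosen to be exactly representable as floats).
arrivals = [0.0, 1.0, 1.5, 5.0]
services = [2.0, 0.5, 1.0, 0.25]
departures = departures_from(arrivals, services)
recovered = service_times(arrivals, departures)
```

When some arrival or departure times are unobserved, this inverse map cannot be evaluated directly, which is why the abstract treats the missing times as latent variables to be sampled.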
Probabilistic Inference in Queueing Networks
Although queueing models have long been used to model the performance of computer systems, they are out of favor with practitioners, because they have a reputation for requiring unrealistic distributional assumptions. In fact, these distributional assumptions are used mainly to facilitate analytic approximations such as asymptotics and large-deviations bounds. In this paper, we analyze queueing networks from the probabilistic modeling perspective, applying inference methods from graphical models that afford significantly more modeling flexibility. In particular, we present a Gibbs sampler and a stochastic EM algorithm for networks of M/M/1 FIFO queues. As an application of this technique, we localize performance problems in distributed systems from incomplete system trace data. On both synthetic networks and an actual distributed Web application, the model accurately recovers the system’s service time using 1% of the available trace data.
Adiabatic optimization without local minima
Several previous works have investigated the circumstances under which
quantum adiabatic optimization algorithms can tunnel out of local energy minima
that trap simulated annealing or other classical local search algorithms. Here
we investigate the even more basic question of whether adiabatic optimization
algorithms always succeed in polynomial time for trivial optimization problems
in which there are no local energy minima other than the global minimum.
Surprisingly, we find a counterexample in which the potential is a single basin
on a graph, but the eigenvalue gap is exponentially small as a function of the
number of vertices. In this counterexample, the ground state wavefunction
consists of two "lobes" separated by a region of exponentially small amplitude.
Conversely, we prove that if the ground state wavefunction is single-peaked, then
the eigenvalue gap scales at worst as one over the square of the number of
vertices.
Comment: 20 pages, 1 figure. Journal versio